Task Loss Estimation for Sequence Prediction
Abstract
The performance on a supervised machine learning task is often evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance, and the BLEU score. A common workaround is to optimize a surrogate loss function instead, such as cross-entropy or hinge loss. For this remedy to be effective, minimizing the surrogate loss must also minimize the task loss, a condition that we call consistency with the task loss. In this work, we propose another method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss. A distinctive feature of this approach is that it defines the desirable value of the score for every input-output pair. We use this property to design specialized surrogate losses for the Encoder-Decoder models often used for sequence prediction tasks. In our experiment, we benchmark on the task of speech recognition. Training an Encoder-Decoder speech recognizer with a new surrogate loss instead of cross-entropy brings a significant 9% relative improvement in Character Error Rate (CER) when no extra corpora are used for language modeling.
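The central idea of the abstract, reading a model's score as an estimate of the task loss and penalizing the estimation error, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the edit distance is used as the task loss, the estimation error is measured as a mean squared gap, and the names `edit_distance`, `estimation_error`, and `character_error_rate` are hypothetical. The abstract does not state the exact surrogate losses the authors derive, so this should not be read as their method.

```python
from typing import Callable, Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (the task loss in this sketch)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def estimation_error(score: Callable[[str, str], float],
                     x: str,
                     ref: str,
                     candidates: Sequence[str]) -> float:
    """Mean squared gap between the model score and the true task loss.

    Assumption: squared error is only one possible way to measure the
    estimation error; the abstract does not specify which one is used.
    """
    gaps = [(score(x, y) - edit_distance(ref, y)) ** 2 for y in candidates]
    return sum(gaps) / len(gaps)


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars
```

In this sketch, a score function that returns the true edit distance for every candidate output has zero estimation error, which is the sense in which the target value of the score is defined for every input-output pair.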
Similar resources
Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task
In this paper, we try to predict earthquake events in a cluster of seismic data on the Pacific Ring of Fire, using multivariate adaptive regression splines (MARS). The model is employed either as a predictor for a sequence prediction task or as a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...
Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function
In risk analysis based on a Bayesian framework, premium calculation requires specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, E-Bayesian and robust Bayesian analysis can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...
STATISTICAL PREDICTION OF THE SEQUENCE OF LARGE EARTHQUAKES IN IRAN
Different probability distributions, described by the Exponential, Pareto, Lognormal, Rayleigh, and Gamma probability functions, are applied to estimate the time of the next great earthquake (Ms≥6.0) in different seismotectonic provinces of Iran. This prediction is based on the information about past earthquake occurrences in the given region and the basic assumption that future seismi...
A method of performance estimation for axial-flow turbines based on losses prediction
The main objective of this paper is to create a method for one-dimensional modeling of multi-stage axial-flow turbines. The calculation used in this technique is based on common thermodynamic and aerodynamic principles in a mean streamline analysis. In this approach, loss models have to be used to determine the entropy increase across each section in the turbine stage. Finally, the analysis an...
Tighter Bounds for Structured Estimation
Large-margin structured estimation methods work by minimizing a convex upper bound of loss functions. While they allow for efficient optimization algorithms, these convex formulations are not tight and sacrifice the ability to accurately model the true loss. We present tighter non-convex bounds based on generalizing the notion of a ramp loss from binary classification to structured estimation. ...
Journal:
- CoRR
Volume abs/1511.06456
Publication date 2015